Continuous-speech phone recognition from ultrasound and optical images of the tongue and lips
نویسندگان
چکیده
The article describes a video-only speech recognition system for a “silent speech interface” application, using ultrasound and optical images of the voice organ. A one-hour audiovisual speech corpus was phonetically labeled using an automatic speech alignment procedure and robust visual feature extraction techniques. HMM-based stochastic models were estimated separately on the visual and acoustic corpus. The performance of the visual speech recognition system is compared to a traditional acoustic-based recognizer.
منابع مشابه
Towards a segmental vocoder driven by ultrasound and optical images of the tongue and lips
This article presents a framework for a phonetic vocoder driven by ultrasound and optical images of the tongue and lips for a “silent speech interface” application. The system is built around an HMM-based visual phone recognition step which provides target phonetic sequences from a continuous visual observation stream. The phonetic target constrains the search for the optimal sequence of diphon...
متن کاملDevelopment of a silent speech interface driven by ultrasound and optical images of the tongue and lips
This article presents a segmental vocoder driven by ultrasound and optical images (standard CCD camera) of the tongue and lips for a “silent speech interface” application, usable either by a laryngectomized patient or for silent communication. The system is built around an audio–visual dictionary which associates visual to acoustic observations for each phonetic class. Visual features are extra...
متن کاملPhone recognition from ultrasound and optical video sequences for a silent speech interface
Latest results on continuous speech phone recognition from video observations of the tongue and lips are described in the context of an ultrasound-based silent speech interface. The study is based on a new 61-minute audiovisual database containing ultrasound sequences of the tongue as well as both frontal and lateral view of the speaker’s lips. Phonetically balanced and exhibiting good diphone ...
متن کاملICPhS 2007 Proceedings
Certain applications require the production of intelligible speech from articulatory data. This paper outlines a research program (Ouisper : Oral Ultrasound synthetIc SPEech souRce) to synthesize speech from ultrasound acquisition of the tongue movement and video sequences of the lips. Video data is used to search in a multistream corpus associating images of the vocal tract and lips with the a...
متن کاملA Visual Speech Recognition System for an Ultrasound-based Silent Speech Interface
The development of a continuous visual speech recognizer for a silent speech interface has been investigated using a visual speech corpus of ultrasound and video images of the tongue and lips. By using high-speed visual data and tied-state cross-word triphone HMMs, and including syntactic information via domain-specific language models, word-level recognition accuracy as high as 72% was achieve...
متن کامل